METHOD AND SCALABLE ARCHITECTURE FOR PARALLEL 
CALCULATION OF THE DCT OF BLOCKS OF PIXELS OF DIFFERENT 
SIZES AND COMPRESSION THROUGH FRACTAL CODING 

Field of the Invention 

The invention relates in general to digital
processing systems for recording and/or transmitting
pictures, and more in particular to systems for
compressing and coding pictures by calculating the
discrete cosine transform (DCT) of blocks of pixels of a
picture. The invention is particularly useful in video
coders according to the MPEG2 standard, though it is
applicable also to other systems.

Background of the Invention 

The calculation of the discrete cosine transform 
(DCT) of a pixel matrix of a picture is a fundamental 
step in processing picture data. A division by a 
quantization matrix is performed on the results of the 
discrete cosine transform for reducing the amplitude of 
the DCT coefficients, as a precondition to data
compression which occurs during a coding phase, 
according to a certain transfer protocol of video data 
to be transmitted or stored. Typically, the calculation 
of the discrete cosine transform is carried out on 
blocks or matrices of pixels, in which a whole picture 
is subdivided for processing purposes. 



The increasing speed requirements of picture processing
systems for storage and/or transmission impose the use
of hardware architectures to speed up the various processing
steps, among which, primarily, is the step of discrete
cosine transform calculation by blocks of pixels. The use
of hardware processing imposes a pre-definition of a few
fundamental parameters, namely the dimensions of the
blocks of pixels into which a picture is subdivided to
meet processing requirements.

Such a pre-definition may represent a heavy
constraint that limits the possibility of optimizing the
processing system, for example an MPEG2 coder, or its
adaptability to different conditions of use in terms of
different performance requirements. There is also an evident
and considerable economic advantage, in terms of cost
reduction, in an integrated data processing system that may
be programmed to calculate in parallel the DCT on
several blocks of pixels of a size selectable among a
certain number of available sizes.

Summary of the Invention

It is evident that a need exists for a method and for
a hardware architecture for calculating the discrete
cosine transform (DCT) on a plurality of blocks of
pixels, in parallel, which provide for the scalability
of the size of the blocks of pixels: for example, the
calculation of the discrete cosine transform (DCT)
either for one block of 8x8 pixels, or for four blocks of
4x4 pixels in parallel, or for sixteen blocks of 2x2
pixels in parallel, according to a selection of the block
size.

Scalability of the size of the block of pixels and
the possibility of performing the calculation of the
discrete cosine transform in parallel on blocks of
congruently reduced size compared to a certain maximum
block size, by a hardware structure, are also
instrumental to the implementation of highly efficient
"hybrid" picture compression schemes and algorithms.
For example, by virtue of the scalability of the block
size and of the ability to calculate in parallel the DCT
on several blocks, it is possible, according to the present
invention, to implement a fractal coding applied in the
DCT domain rather than in the space domain of picture
data, as is customary.

Therefore, another important aspect of the invention
is a new picture data compressing and coding method that
is made practical by a hardware structure
calculating the DCT on blocks of scaleable size and
which includes:

subdividing a picture by defining two distinct types of
subdivision blocks: a first type, of N/i×N/i dimension,
called range blocks, that are not overimposable one on
another, and a second type, of N×N dimension, called
domain blocks, that are transferable by intervals of N/i
pixels and overimposable one on another (by transferring
on the original picture a window that identifies a domain
block by an interval equivalent to the horizontal
and/or vertical dimension of a range block);

calculating the discrete cosine transform (DCT) of
the 2^ range blocks and of a relative domain block in
parallel;

classifying the transformed range blocks according
to their relative complexity, calculated by summing
the three AC coefficients;

applying the fractal transform in the DCT domain
to the data of range blocks whose complexity exceeds a
pre-defined threshold and storing only the DC
coefficient of the range blocks with a complexity
lower than said threshold, identifying a relative
domain block belonging to the range block being
transformed that produces the best fractal
approximation of the same range block and calculating
its discrete cosine transform;

calculating a difference picture between each
range block and its fractal approximation;

quantizing said difference picture in the DCT
domain by using a quantization table designed as a
function of human sight characteristics;

coding said quantized difference picture by a
method based on the probability of the quantization
coefficients;

storing or transmitting the coding code for each
range block compressed in the DCT domain and the DC
coefficient of each uncompressed range block.

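
The subdivision step above can be pictured with a minimal sketch. The function names and the (row, column) convention for block origins are illustrative assumptions, not part of the described hardware; the sketch only enumerates where the non-overlapping range blocks and the overlapping domain blocks start.

```python
def range_block_origins(height, width, n, i):
    """Top-left corners of the non-overlapping (n // i) x (n // i) range blocks."""
    r = n // i
    return [(y, x)
            for y in range(0, height - r + 1, r)
            for x in range(0, width - r + 1, r)]


def domain_block_origins(height, width, n, i):
    """Top-left corners of the n x n domain blocks.

    The window is moved in steps of n // i pixels, so neighbouring
    domain blocks overlap whenever i > 1.
    """
    step = n // i
    return [(y, x)
            for y in range(0, height - n + 1, step)
            for x in range(0, width - n + 1, step)]
```

For a hypothetical 16x16 picture with N = 8 and i = 2, this yields sixteen 4x4 range blocks and nine 8x8 domain blocks, the latter shifted by 4 pixels at a time.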
Denoting by F_R(u,v) the DCT of a generic range
block, in the domain of the DCT the AC coefficients
indicate the block's complexity. For the case of N = 8,
the four coefficients at the top left of the block are:

the DC coefficient, which occupies the position 00; and

the three AC coefficients, which occupy the
positions 01, 10 and 11, respectively.

The AC coefficients are used to decide to which class a
certain range block belongs: if the sum of their
absolute values is less than a determined threshold T,
the range block in question is classified as a "low
activity" block; on the contrary, if the sum is equal to or
greater than T, the range block is classified as a "high
activity" block.

For a low activity range block, the AC coefficients
are small and therefore they may be omitted without
significantly affecting fidelity: in this case the block
may be approximated by storing only its DC coefficient.
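
The classification rule can be sketched as follows. The function name, the list-of-lists layout of the coefficients and the returned labels are assumptions made for illustration only.

```python
def classify_range_block(dct, threshold):
    """Classify a range block from its DCT coefficients.

    dct[u][v] is the coefficient at position (u, v). The activity is the
    sum of the absolute values of the AC coefficients at positions
    01, 10 and 11. Returns ("low", dc) when the block can be represented
    by its DC coefficient alone, otherwise ("high", None).
    """
    activity = abs(dct[0][1]) + abs(dct[1][0]) + abs(dct[1][1])
    if activity < threshold:
        return "low", dct[0][0]
    return "high", None
```

A block whose activity equals the threshold is treated as "high activity", as stated in the text.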

For a high activity range block, the procedure of
fractal coding of the invention includes searching for two
appropriate linear transforms, for example rotations, φ,
overturns, τ, or the like, and a domain block, the DCT of
which is denoted by F_D(u,v), which at least
approximately satisfy the following equations:

    F_R(u,v) = \varphi(F_D(u,v))

    F_R(u,v) = \tau(F_D(u,v))        (1)

Having so identified the domain block most similar
or homologous to the range block that is being
processed, its parameters (u,v), τ, φ are stored. The
difference picture between the range block and its
fractal approximation is then calculated:

    D(u,v) = F_R(u,v) - \varphi(F_D(u,v))

    D(u,v) = F_R(u,v) - \tau(F_D(u,v))        (2)

By quantizing the difference picture D(u,v),

    D_Q(u,v) = \mathrm{INTEG}[D(u,v) / Q(u,v)]        (3)

is obtained, where:

D_Q(u,v) is the quantized difference picture in the
domain of the DCT;

Q(u,v) is a quantization table designed by
considering human sight characteristics;

INTEG is a function that approximates its argument
to the nearest integer.

After quantization, the majority of the D_Q(u,v)
coefficients are null. Therefore, it is easy to design a
coding, for example Huffman coding, based on the
probabilities of the coefficients. Finally, the code to
be recorded or transmitted is stored. The compression
procedure terminates when each range block has been
coded.
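
Equations (2) and (3) can be illustrated by a short sketch. The names `integ` and `quantized_difference` are hypothetical, and rounding halves upward is only one possible reading of "nearest integer"; the text does not say how ties are resolved.

```python
import math


def integ(x):
    """Round to the nearest integer (ties rounded upward; an assumption)."""
    return math.floor(x + 0.5)


def quantized_difference(f_range, f_approx, q_table):
    """Quantized difference D_Q(u,v) = INTEG[(F_R - approximation) / Q].

    f_range  : DCT of the range block, F_R(u,v)
    f_approx : its fractal approximation in the DCT domain
    q_table  : quantization table Q(u,v), all entries non-zero
    """
    n = len(f_range)
    return [[integ((f_range[u][v] - f_approx[u][v]) / q_table[u][v])
             for v in range(n)]
            for u in range(n)]
```

With a good approximation most entries of the result are zero, which is what makes a probability-based code such as Huffman coding effective.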
Brief Description of the Drawings

The different aspects and implementations of the
scaleable architecture for calculating the discrete
cosine transform of the invention, as well as of the
method of compression and fractal coding, will be more
easily understood through the following detailed
description of an embodiment of the architecture of the
invention and of the different functioning modes
according to the selected size of the
blocks of pixels into which the picture is divided, by
referring to the annexed drawings, wherein:

Figure 1 is a block diagram of a coder effecting
hybrid compression based on fractal coding and DCT,
according to the present invention;

Figure 2 is a flow chart of the parallel computation
of the DCT of sixteen blocks of 2x2 pixel size;

Figure 3 illustrates the architecture for parallel
computation of sixteen 2x2 DCTs;

Figure 4 shows the arrangement of the input data for
calculating the sixteen 2x2 DCTs;

Figure 5 shows the PROCESS phase of the calculating
procedure of sixteen 2x2 DCTs;

Figure 6 illustrates the architecture for parallel
computation of four 4x4 DCTs;

Figure 7 shows the arrangement of the input data for
calculating the four 4x4 DCTs;

Figure 8 shows the PROCESS phase of the calculating
procedure of four 4x4 DCTs;

Figure 9 illustrates the architecture for parallel
computation of an 8x8 DCT;

Figure 10 shows the arrangement of the input data for
calculating an 8x8 DCT;

Figure 11 shows the PROCESS phase for the calculating
procedure of an 8x8 DCT;

Figure 12 shows the scaleable hardware architecture
of the invention for calculating an 8x8 DCT or four 4x4
DCTs in parallel or sixteen 2x2 DCTs in parallel;

Figure 13 shows the INPUT structure of the scaleable
architecture of the invention;

Figure 14 shows the PROCESS structure for the
scaleable architecture of the invention;

Figure 15 shows the functional schemes of the blocks
that implement the PROCESS phase in the scaleable
architecture of the invention;

Figure 16 is a detailed scheme of the QA block;

Figure 17 is a detailed scheme of the QB block;

Figure 18 is a detailed scheme of the QC block;

Figure 19 is a detailed scheme of the QD block;

Figure 20 is a detailed scheme of the QE block;

Figure 21 is a detailed scheme of the QF block;

Figure 22 is a detailed scheme of the QG block;

Figure 23 illustrates an implementation of the ORDER
phase in the scaleable architecture of the invention;
and

Figure 24 illustrates an implementation of the OUTPUT
phase in the scaleable architecture of the invention.

Detailed Description of the Preferred Embodiments

Though referring in some of the schemes illustrated
in the figures to a particularly significant and
effective implementation of the architecture of parallel
computation of the discrete cosine transform (DCT) on
blocks of pixels of scaleable size, which comprises a
compression phase for the fractal coding of the picture
data, it is understood that the method and architecture
of parallel calculation of the discrete cosine
transform (DCT) of a bidimensional matrix of input
data by blocks of a scaleable size provide for an
exceptional freedom in implementing particularly
effective compression algorithms by exploiting the
scalability and the possibility of a parallel
calculation of the DCT.

The partitioning steps and the calculation of the 
discrete cosine transform of a bidimensional matrix of 
input data will be described separately for each size of 
range block, according to an embodiment of the 
invention, starting from the smallest block dimension of 
2x2 for which the DCT calculation is performed in 
parallel, up to the maximum block dimension of 8x8. 
A description of an architecture that is scaleable according
to needs, by changing the value of the global variable
size, will follow.

The procedure for a parallel DCT computation of the 
invention may be divided in distinct phases: 

INPUT phase 

PROCESS phase 

ORDER phase 

OUTPUT phase. 
Each phase is hereinbelow described for each case 
considered. 

The DCT operation may be defined as follows. 
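For an input data matrix X_{N \times N} = [x_{i,j}], 0 \le i,j \le N-1, the
output matrix Y_{N \times N} = [y_{m,n}], 0 \le m,n \le N-1, is defined by:

    y_{m,n} = \frac{2}{N} \varepsilon(m) \varepsilon(n) \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} x_{i,j} \cos\left[\frac{(2i+1)m\pi}{2N}\right] \cos\left[\frac{(2j+1)n\pi}{2N}\right]        (4)

where:

    \varepsilon(n) = \frac{1}{\sqrt{2}}   for n = 0

    \varepsilon(n) = 1                    for 1 \le n \le N-1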

For convenience, assume that N = 2^i, where i is an
integer and i \ge 1. Let us remove \varepsilon(m), \varepsilon(n), and the
normalization value 2/N from equation (4), in view of
the fact that they may be reintroduced in a successive
step. Therefore, from now on, the following simplified
version of equation (4) will be used:

    y_{m,n} = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} x_{i,j} \cos\left[\frac{(2i+1)m\pi}{2N}\right] \cos\left[\frac{(2j+1)n\pi}{2N}\right]        (5)



Parallel computation of sixteen 2x2 DCTs

For N = 2 equation (5) becomes:

    y_{m,n} = \sum_{i=0}^{1} \sum_{j=0}^{1} x_{i,j} \cos\left[\frac{(2i+1)m\pi}{4}\right] \cos\left[\frac{(2j+1)n\pi}{4}\right]        (6)

The flow graph for a 2x2 DCT is shown in Fig. 2, in 
which A=B=C=1 and the input and output data are the 
pixels in the positions (0, 0) , (0, 1) , (1, 0) , (1, 1) . 
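
One possible reading of a flow graph with unit multipliers (A = B = C = 1) is a pair of add/subtract butterflies. The sketch below follows that reading, which is an assumption about Fig. 2 rather than a transcription of it, and compares the result with a direct evaluation of the normalized definition in equation (4). For N = 2 every scale factor in equation (4) collapses to 1/2, so the butterfly output differs from the normalized DCT only by that constant.

```python
import math


def dct2_direct(x):
    """Normalized 2-D DCT of an N x N block, term by term from equation (4)."""
    n = len(x)

    def eps(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    return [[(2 / n) * eps(m) * eps(q) * sum(
                 x[i][j]
                 * math.cos((2 * i + 1) * m * math.pi / (2 * n))
                 * math.cos((2 * j + 1) * q * math.pi / (2 * n))
                 for i in range(n) for j in range(n))
             for q in range(n)]
            for m in range(n)]


def dct2x2_butterfly(x):
    """2x2 DCT with additions and subtractions only (scale factors omitted)."""
    s0, d0 = x[0][0] + x[0][1], x[0][0] - x[0][1]   # first row: sum, difference
    s1, d1 = x[1][0] + x[1][1], x[1][0] - x[1][1]   # second row: sum, difference
    return [[s0 + s1, d0 + d1],
            [s0 - s1, d0 - d1]]
```

Dividing each butterfly output by 2 reintroduces the normalization that was removed when passing from equation (4) to equation (5).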

Let us now consider how to calculate in parallel
the sixteen 2x2 DCTs into which an 8x8 block is subdivided.
The procedure is divided in several steps, a global view of
which is depicted in Fig. 3. This figure highlights the
transformations performed on the 2x2 block constituted
by the pixels (0,6), (0,7), (1,6), (1,7).

The pixels that constitute the input block are
ordered in the INPUT phase and are processed in the
PROCESS phase to obtain the coefficients of the sixteen
bidimensional DCTs, or briefly 2-D DCTs, on four samples.
For example, the 2-D DCT of the block (0,1) constituted
by:

{l[0], m[0], n[0], o[0]} is {a[0], b[0], c[0], d[0]}.

The coefficients of the 2-D DCT are re-arranged in
the ORDER phase into eight vectors of eight components.
For example, the coefficients {a[0], b[0], c[0], d[0]}
are components of the vector l'. The vectors thus obtained
proceed to the OUTPUT phase to give the coefficients of
the 2x2 DCTs, constituting the output block.

INPUT phase

The pixels of each block (i,j), with 0 ≤ i ≤ 1 and 0 ≤ j
≤ 3, are ordered to constitute the eight-component vectors l, m, n,
o in the following manner:

the pixels that occupy the position (0,0) in the
block constitute the vector l;

the pixels that occupy the position (0,1) in the
block constitute the vector m;

the pixels that occupy the position (1,0) in the
block constitute the vector n;

the pixels that occupy the position (1,1) in the
block constitute the vector o.

Similarly, the pixels of each block (i,j), with 2
≤ i ≤ 3 and 0 ≤ j ≤ 3, are ordered to constitute the
eight-component vectors p, q, r, s in the following
manner:

the pixels that occupy the position (0,0) in the
block constitute the vector p;

the pixels that occupy the position (0,1) in the
block constitute the vector q;

the pixels that occupy the position (1,0) in the
block constitute the vector r;

the pixels that occupy the position (1,1) in the
block constitute the vector s.

This arrangement is detailed in Fig. 4. It should be
noted, for example, that the pixels of the block (0,3)
will constitute the third component of the l, m, n, o
vectors.

PROCESS phase

The PROCESS phase includes calculating in parallel
the sixteen 2-D DCTs by processing the eight-component
vectors l, m, ..., s as shown in Fig. 5. It is noted, for
example, that the coefficients of the 2-D DCT of the
block (0,3) will constitute the third component of the
vectors a, b, c, d of Fig. 3.

ORDER phase

The ORDER phase includes arranging the output
sequences of the eight 2-D DCTs in eight vectors l', m',
..., s' thus defined:

    l' = [a[0], b[0], c[0], d[0], a[1], b[1], c[1], d[1]]^T
    m' = [a[2], b[2], c[2], d[2], a[3], b[3], c[3], d[3]]^T
    n' = [a[4], b[4], c[4], d[4], a[5], b[5], c[5], d[5]]^T
    o' = [a[6], b[6], c[6], d[6], a[7], b[7], c[7], d[7]]^T
    p' = [e[0], f[0], g[0], h[0], e[1], f[1], g[1], h[1]]^T
    q' = [e[2], f[2], g[2], h[2], e[3], f[3], g[3], h[3]]^T
    r' = [e[4], f[4], g[4], h[4], e[5], f[5], g[5], h[5]]^T
    s' = [e[6], f[6], g[6], h[6], e[7], f[7], g[7], h[7]]^T







It is noted, for example, that the coefficients of
the 2-D DCT of the block (0,3) will constitute the
components 4, 5, 6, 7 of the vector m'.

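OUTPUT phase

This phase includes rearranging the output data:
starting from the eight-component vectors a, b, ..., h,
as rearranged into l', m', ..., s' by the ORDER phase,
a 64-component vector defined as follows is
constructed:

    \begin{bmatrix}
    y[0] & y[1] & \cdots & y[7] \\
    \vdots & & & \vdots \\
    y[56] & y[57] & \cdots & y[63]
    \end{bmatrix}
    =
    \begin{bmatrix}
    l'[0] & l'[1] & l'[4] & l'[5] & m'[0] & m'[1] & m'[4] & m'[5] \\
    l'[2] & l'[3] & l'[6] & l'[7] & m'[2] & m'[3] & m'[6] & m'[7] \\
    n'[0] & n'[1] & n'[4] & n'[5] & o'[0] & o'[1] & o'[4] & o'[5] \\
    n'[2] & n'[3] & n'[6] & n'[7] & o'[2] & o'[3] & o'[6] & o'[7] \\
    p'[0] & p'[1] & p'[4] & p'[5] & q'[0] & q'[1] & q'[4] & q'[5] \\
    p'[2] & p'[3] & p'[6] & p'[7] & q'[2] & q'[3] & q'[6] & q'[7] \\
    r'[0] & r'[1] & r'[4] & r'[5] & s'[0] & s'[1] & s'[4] & s'[5] \\
    r'[2] & r'[3] & r'[6] & r'[7] & s'[2] & s'[3] & s'[6] & s'[7]
    \end{bmatrix}        (7)
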
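
The four phases for the sixteen 2x2 DCTs can be traced end to end with the following sketch. It rests on several assumptions: the component index of block (i,j) inside a vector is taken to be 4·(i mod 2) + j, which agrees with the examples given for block (0,3); the PROCESS phase is modelled with unit-multiplier butterflies; and the function names are illustrative.

```python
def input_phase(block):
    """Gather the pixels of the sixteen 2x2 sub-blocks into l, m, ..., s."""
    vecs = {name: [0] * 8 for name in "lmnopqrs"}
    for bi in range(4):                      # block row
        for bj in range(4):                  # block column
            names = "lmno" if bi < 2 else "pqrs"
            k = 4 * (bi % 2) + bj            # assumed component index
            for pos, name in enumerate(names):
                di, dj = divmod(pos, 2)      # position inside the 2x2 block
                vecs[name][k] = block[2 * bi + di][2 * bj + dj]
    return vecs


def process_phase(v):
    """Component-wise butterflies: (l,m,n,o) -> (a,b,c,d), (p,q,r,s) -> (e,f,g,h)."""
    out = {}
    for src, dst in (("lmno", "abcd"), ("pqrs", "efgh")):
        w, x, y, z = (v[c] for c in src)
        out[dst[0]] = [w[k] + x[k] + y[k] + z[k] for k in range(8)]
        out[dst[1]] = [w[k] - x[k] + y[k] - z[k] for k in range(8)]
        out[dst[2]] = [w[k] + x[k] - y[k] - z[k] for k in range(8)]
        out[dst[3]] = [w[k] - x[k] - y[k] + z[k] for k in range(8)]
    return out


def order_phase(c):
    """Build l' = (a0,b0,c0,d0,a1,b1,c1,d1), ..., s' = (e6,...,h6,e7,...,h7)."""
    primed = {}
    for names, letters in (("lmno", "abcd"), ("pqrs", "efgh")):
        for idx, name in enumerate(names):
            primed[name] = [c[ch][k]
                            for k in (2 * idx, 2 * idx + 1)
                            for ch in letters]
    return primed


def output_phase(p):
    """Lay the primed vectors out as the rows of the 8x8 output block."""
    rows = []
    for left, right in (("l", "m"), ("n", "o"), ("p", "q"), ("r", "s")):
        for base in (0, 2):
            idx = (base, base + 1, base + 4, base + 5)
            rows.append([p[left][i] for i in idx] + [p[right][i] for i in idx])
    return rows


def dct_sixteen_2x2(block):
    return output_phase(order_phase(process_phase(input_phase(block))))
```

Under these assumptions the coefficients of each 2x2 block end up in the same 2x2 position of the output that the block occupied in the input.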
Parallel computation of four 4x4 DCTs

For N = 4, equation (5) becomes:

    y_{m,n} = \sum_{i=0}^{3} \sum_{j=0}^{3} x_{i,j} \cos\left[\frac{(2i+1)m\pi}{8}\right] \cos\left[\frac{(2j+1)n\pi}{8}\right]        (8)

If:













yj 
y^. 


• ^64*1 ~ 


A 



where : 



Yi = tyi,o/yi,i/yi,2/yi,3l ' 

{/0,rg.O = to = -f^J-k/to 

W,i )?=o = {^0,0 . ^1,1 . . ^3,3 } > 
)f=o = {^1.0 » ^3,1 . . 2,3 } ' 



r7; 
it may be demonstrated that:

^16*1 =(^4)16^16*1 



(9) 



where ; 



The matrices 



(^0)4 



(^^0)4 
-(^2)4 

(^.)4 



(^0)4 

-(^.)4 
-(^3)4 



(10) 



(Hi)^, i =0, 1, 2, 3 are as follows: 



(^0)4 = 



0 
1 

0 
0 



0 
0 

1 

0 



te)4 = 



0 
0 

2 
0 



0 

}_ 

2 
0 

1^ 
2 



1 
0 

0 

0 



0' 
0 
0 
1 

0 
\^ 

2 
0 

_\ 
2J 



' (^3)4 = 



0 

}_ 

2 
0 

0 

0 
0 

0 

J_ 

.2 



1 
0 

2 
0 

0 
0 

J_ 
2 
0 



0 

}_ 

2 
0 

j_ 
2 

0 
_[ 

2 
0 



0 
0 

\_ 
2 
0 



1 
0 

J_ 
2 



- 0 



The monodimensional DCT, or briefly the 1-D DCT, is 
expressed by the matrix (1-D DCT) ^ given by: 



1 

cl 



1 



1 



cl -cl 
cl -c\ -c\ 



cl 



-cl 



cl 



1 

-cl 



where C 



From these equations it may be said that the
computation of one 4x4 DCT may be divided into two
steps:

computation of four 1-D DCTs, each performed on an
appropriate sequence of four pixels;

computation of the 2-D DCT starting from the four
1-D DCTs.

These two steps are carried out in a similar manner,
and are implemented with the same hardware, which is used
twice. Let us consider now how to calculate in parallel
four 4x4 DCTs. The total 64 samples are obtained from
the 4x4 blocks into which an 8x8 block is subdivided.
The procedure is subdivided in distinct phases, to each
of which corresponds an architectural block. A whole
view is shown in Fig. 6. This figure highlights the
transformations carried out on each 4x4 block.

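
The idea of reusing one 1-D unit twice can be illustrated with the ordinary row-column decomposition. This is not the decomposition of the text, which applies the 1-D DCTs to specially chosen pixel sequences; the sketch only shows that two passes through the same 1-D routine reproduce the unnormalized 2-D DCT of equation (5). All function names are illustrative.

```python
import math


def dct_1d(seq):
    """Unnormalized 1-D DCT: y[k] = sum_i seq[i] * cos((2i+1) k pi / (2N))."""
    n = len(seq)
    return [sum(seq[i] * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                for i in range(n))
            for k in range(n)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def dct_2d_two_pass(block):
    """2-D DCT obtained by running the same 1-D routine twice."""
    first = [dct_1d(row) for row in block]               # pass 1: along rows
    second = [dct_1d(col) for col in transpose(first)]   # pass 2: along columns
    return transpose(second)


def dct_2d_direct(block):
    """Direct evaluation of equation (5), used here as a reference."""
    n = len(block)
    return [[sum(block[i][j]
                 * math.cos((2 * i + 1) * m * math.pi / (2 * n))
                 * math.cos((2 * j + 1) * q * math.pi / (2 * n))
                 for i in range(n) for j in range(n))
             for q in range(n)]
            for m in range(n)]
```

The two results agree to within floating-point rounding for any square block, including the 4x4 and 8x8 sizes considered here.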
INPUT Phase 

The pixels of each quadrant (i,j), 0 ≤ i,j ≤ 1, are
ordered to constitute the vectors:









x^'j 




•^2,0 




•^3,0 


JJ _ 
^1 


x''j 

x''j 

x''j 
- 3,3_ 


a' J _ 
, ^2 — 


x''j 
^3.1 

x''J 

x''j 
. 2,3_ 


b'J - 


•^0,1 

*3,2 

x''j 
/1,3 _ 


, ^1 - 


*2,1 

x''J 
■^1.2 

x''j 
. 0.3_ 



After arranging the data in 16 four-component
vectors, we define the eight-component vectors l, m, n,
o, constituted by the first, second, third and fourth
components, respectively, of the initial vectors
constituted by the pixels of the 00 and 01 quadrants,
and the vectors p, q, r, s, constituted by the first,
second, third and fourth components, respectively, of
the initial vectors constituted by the pixels of the 10
and 11 quadrants. Precisely:



MoT-' 












^3[3r 


A,[or 




^3[ir 
















53[2f0 




53[3f'0 




m = 




n = 


A,[2r 


o = 












AM'' 




A,[3r 






B,[ir 




B,[2r 






_B,[0f'' 




Mr 






BA^r 
p =











Ai[3\' 










^3 1-5 J 


53[lf° 




B3[2V 




53[3f'0 






























^3[3]''' 










B,[3r 










.Bd^r 



^3 



B['J 



AdoV 
BdoV 
BdoV 

Ador 

bM'' 

[BdoFi 

By taking into account the way in which the vectors 
A[*^ , Al^J , B^'-^ , B\'J are defined, the arrangement detailed 
in Fig. 7 is obtained. It should be noted that in this 
figure the original 8x8 block is subdivided in the four 
4x4 quadrants, within each quadrant (i, j) , the pixels 

belonging to the respective vectors A['^ , A^-^ , 
have different shadings in the figure • 

According to what has been described above, the 
computation of a 4x4 DCT may be subdivided in two 
stages: consequently, the PROCESS phase that is the only 

phase in which arithmetical operations are performed, is 
done twice : 

a first time, to compute in parallel the sixteen 
1-D DCTs; 

a second time, to compute in parallel four 4x4 DCTs
starting from the coefficients of the 1-D DCTs.

The variable stage indicates whether the first or 
second calculation stage is being performed. 

During the INPUT phase the variable stage is updated 
to the value 0 . 

At the input in the PROCESS phase, there are 64 input 
MUXes that are controlled by the variable stage. Each MUX 
receives two inputs: 

a pixel of the original picture, coming from the 

INPUT phase (this input is selected when stage = 0) ; 

a coefficient of a 1-D DCT, coming from the ORDER
phase (this input is selected when stage = 1).

PROCESS phase 

This phase includes processing the l, m, ..., s
vectors as shown in Fig. 8. In this figure the following
symbols are used:



2Ci X L 



8x8 



(Hi), 



per stage = 0 
per stage = 1 



(11) 



B = 



ICixl 



8x8 



0 



per stage = 0 
per stage = 1 



(12) 



C = 



2C4 X /g><8 

(^2)4 
0 



0 

(^2)4 



per stage = 0 
per stage - 1 



(13) 



    t = \begin{cases} 1 & \text{for stage} = 0 \\ 2 & \text{for stage} = 1 \end{cases}        (14)


At the output of the PROCESS structure there are 64
DEMUXes controlled by the variable stage. The DEMUXes
route the data according to two conditions:

if stage = 0, the input datum to each DEMUX is a 
coefficient of a 1-D DCT; therefore the datum must be 
further processed and, for this purpose, is conveyed 
to the ORDER phase; 

if stage = 1, the input datum to each DEMUX is a 
coefficient of a 2-D DCT; therefore the datum must not 
be processed further and therefore is conveyed to the 
OUTPUT phase . 
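
The role of the variable stage in steering data through the MUXes and DEMUXes can be summarized by a small control-flow sketch. The callables `process` and `order` are placeholders standing in for the PROCESS and ORDER hardware, and the trace list is added only to make the sequence of phases visible; none of these names come from the text.

```python
def run_two_pass(pixels, process, order):
    """Drive the PROCESS block twice under control of the variable stage.

    process(data, stage) and order(data) are caller-supplied stand-ins.
    Returns the output data and the list of phases visited.
    """
    trace = []
    stage = 0                # set to 0 during the INPUT phase
    data = pixels            # input MUX selects the pixels when stage == 0
    while True:
        trace.append(("PROCESS", stage))
        data = process(data, stage)
        if stage == 0:
            # output DEMUX: 1-D coefficients are routed to the ORDER phase
            trace.append(("ORDER", stage))
            data = order(data)
            stage = 1        # updated after the ORDER phase
        else:
            # output DEMUX: 2-D coefficients are routed to the OUTPUT phase
            trace.append(("OUTPUT", stage))
            return data, trace
```

Called with any pair of stand-in functions, the trace is always PROCESS, ORDER, PROCESS, OUTPUT, which is the sequence described for the 4x4 case.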

ORDER Phase 

The ORDER phase includes arranging the output
sequence of the eight 1-D DCTs in eight l', m', ..., s'
vectors, thus defined:






^3 






, fi: 






' fO,0' 

Jo 
Jo 



n — 



0,1 



Jo 
Jo 



f2 



1.0 



-a[0]\ 
b[0] 
c[0] 

40] 
a[4] 
b[4] 
c[4] 
_44]\ 

'o[2]] 
b[2] 
c[2] 
42] 
a[6] 
b[6] 
c[6] 

-e[0]- 

/[o] 

m 

e[4] 
/[4] 

lh[4l 

-e[2Y 
f[2] 
g[2] 
h[2] 
e[6] 
/[6] 
^[6] 



0 = 



yo,o- 



fx 



0.1 



f'. 



1.1 



-41]- 
b[l] 

c[l] 

^[1] 

a[5] 
b[5] 
c{5] 

.m 

b[3] 
c[3] 
d[2] 
a[l] 
b[l] 
c[7] 

.^[711 

-e[l]- 

/[l] 

m 

e{5] 
/[5] 
^[5] 
h{5]. 

/[3] 

m 

e{l] 
/[7] 

After the ORDER phase the variable stage is updated
to the value 1. The output data from the ORDER phase are
sent to the PROCESS phase.

OUTPUT phase 

This phase includes rearranging the data originating
from the second (stage = 1) execution of the PROCESS
step: starting from these data, which constitute the
eight-component vectors a, ..., h, the output block
Y_{8 \times 8} is thus defined:



Ao] >{i] 

_){54 }{55] 



^63] 



■40] 40] 40] 40] 4)] /[o] M Mo] 

^[3] c[3] 43] e[3] /[3] ^S] ^3] 

/I4] g[4] 44] 44] 44] 44] 44] 

i i i i : i : : 

.47] /[7] 47] 47] 47] 47] 47] 

The main differences between the hardware for
calculating the four 4x4 DCTs and the hardware needed
for the sixteen 2x2 DCTs are the following:

the ordering sequences of the pixels of the block
of the original picture depend on the chosen DCT size;

to execute the sixteen 2x2 DCTs the PROCESS step
must be carried out only once; instead, to execute the
four 4x4 DCTs the PROCESS step must be repeated two
times;

the operations executed during the PROCESS phase
are not always the same for the two cases.

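Computation of an 8x8 DCT

For N = 8 equation (5) becomes:

    y_{m,n} = \sum_{i=0}^{7} \sum_{j=0}^{7} x_{i,j} \cos\left[\frac{(2i+1)m\pi}{16}\right] \cos\left[\frac{(2j+1)n\pi}{16}\right]        (16)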



Putting : 
> ^64x1 =








/7. 



where : 



yi = \yisi yi.\ ■■■ yipV 
W to = i5C7-(H.,- to = ocr({B,, to), 

to = ^cr(U,, to). to - i'crk, to). 
{/2,.to=^cr({^5,,to . k.to'C'^^^'to). 
{/vto=^crk<to). K.to = WK,to . 

{A/}J=0 ={^0.0 JC2,2 ^3.3 ^4,4 ^^5.5 ^6,6 ^7,7 / 

Wi}J=o ={*1,0 ^4,1 ^7.2 ^5.3 ^2.4 *0.5 *3,6 *6.7 * 

{^5./}J=o ={*2.0 *3,2 >^1,3 ^6,4 ^5 ^0,6 *5,7 ' 

W/}J=o = fa.O ^5,1 *1.2 ^7.3 J^0,4 *6,5 ^2,6 ^^4^ / 

{^7,/}J=o ={^4.0 ^2.1 JC6,2 ^0,3 ^7,4 ^1,5 *5,6 *3,7 ' 

fe,/}J=o = fe.O ^0.1 ^4.2 ^6.3 ^1.4 JC3,5 *7.6 ^2.7}/ 

{^3./iJ=o=Ko ^3,1 JC0,2 ^2,3 *5.4 ^7.5 ^4.6 ^1.7 } ' 

{^1,/}J=0 = {^7.0 Jf6,l ^5.2 *4.3 ^^3,4 ^2,5 ^1,6 ^0.7 } 
it may be demonstrated that : 

^64x1 =(^8)64^64x1 (17) 

where : 
The 1-D DCT is expressed by the matrix:

where we put:



„ = COS( — TT) 

n 

From the above equations it is evident that the 
computation of an 8x8 DCT may be subdivided in two 
stages : 

calculating eight 1-D DCTs, each for a certain 
sequence of eight pixels; 

calculating the 2-D DCT, starting from the eight 
1-D DCTs. 

These two stages may be executed through the same
hardware by using it twice. The processing is subdivided
in different steps, to each of which corresponds an
architectural block. A whole view of the hardware is
shown in Fig. 9.

INPUT Phase 

The pixels of the 8x8 input block are ordered to
constitute the eight-component vectors l, m, n, o, p, q,
r, s:



'Am 








"^l[7]" 


^3[0] 




A^m 




A^m 






Asm 




AsU] 


Am 




Aim 






Bjm 


, m = 


57 [1] 




57 [7] 


Bsm 










sm 








B3[7] 


5,[0] 








.Bl[7]_ 



By taking into account the way in which the vectors
A_1, A_3, A_5, A_7, B_7, B_5, B_3, B_1 are defined, we obtain the
detailed arrangement of Fig. 10. It should be noticed
that in this figure the pixels belonging to the vectors
A_1, A_3, A_5, A_7, B_7, B_5, B_3, B_1 are marked by
different shadings.

As shown above, the computation of an 8x8 DCT may be
subdivided into two stages. The PROCESS step, which is
the only phase in which mathematical operations are
performed, is performed twice:

the first time, to compute in parallel sixteen 1-D
DCTs;

the second time, to compute the 8x8 DCT starting
from the coefficients of the sixteen 1-D DCTs.

The variable stage indicates whether the first or
second calculation step is being performed. During the
INPUT phase, the variable stage is updated to the value
0.

At the input of the PROCESS structure, there are 64 
MUXes controlled by the variable stage. Each MUX receives 
two inputs: 

a pixel of the original picture, originating from
the INPUT phase (this input is selected when stage =
0);

a coefficient of a 1-D DCT, originating from the 
ORDER phase (this input is selected when stage = 1) . 
PROCESS Phase 

This phase includes processing the l, m, ..., s
vectors as shown in Fig. 11. In this figure, the
following symbols are used:



A = 



2Cg X /g^g 

2Cg x/gxg 
-(^6)8 

(^4)8 



per stage = 0 
per stage = 1 

per stage = 0 
per stage = 1 

per stage = 0 
per stage = 1 



(19) 



(20) 



(21) 



    t = \begin{cases} 1 & \text{for stage} = 0 \\ 2 & \text{for stage} = 1 \end{cases}        (22)

At the output of the PROCESS structure there are 64
DEMUXes controlled by the variable stage. The DEMUXes
route the data according to two possibilities:



if stage = 0, the input datum to each DEMUX is a 
coefficient of a 1-D DCT; therefore, the datum must be 
further processed and, for this purpose, is sent to 
the ORDER phase; 

if stage = 1, the input datum to each DEMUX is a 
coefficient of a 2-D DCT; therefore, the datum does 
not need any further processing and therefore is sent 
to the OUTPUT phase . 

ORDER phase 

This phase includes arranging the output sequence of
the eight 1-D DCTs in eight l', m', ..., s' vectors, thus
defined:

    l' = f_0 = [a[0], b[0], c[0], d[0], e[0], f[0], g[0], h[0]]^T
    m' = [a[1], b[1], c[1], d[1], e[1], f[1], g[1], h[1]]^T
    ...
    s' = [a[7], b[7], c[7], d[7], e[7], f[7], g[7], h[7]]^T



Following the ORDER phase the variable stage is 
updated to the value 1. The output data from the ORDER 
phase are sent to the PROCESS phase . 
OUTPUT phase 

This phase includes rearranging the data originating
from the second execution of the PROCESS step (that is,
with stage = 1): starting from these data, which
constitute the eight-component vectors a, b, ..., h, the
output block Y_{8 \times 8}, defined as follows, is constituted:

    \begin{bmatrix}
    y[0] & y[1] & \cdots & y[7] \\
    \vdots & & & \vdots \\
    y[56] & y[57] & \cdots & y[63]
    \end{bmatrix}
    =
    \begin{bmatrix}
    a[0] & b[0] & c[0] & d[0] & e[0] & f[0] & g[0] & h[0] \\
    \vdots & & & & & & & \vdots \\
    a[7] & b[7] & c[7] & d[7] & e[7] & f[7] & g[7] & h[7]
    \end{bmatrix}        (23)



The main differences between the hardware that
calculates an 8x8 DCT and the hardware that calculates
the four 4x4 DCTs are:

the sequences into which the pixels of a block of
the original picture must be arranged depend on
the chosen size of the DCT;

the operations executed during the PROCESS step
are not always the same for the two cases.

Procedure for calculating the DCT for blocks of
scaleable size (8x8 DCTs, 4x4 DCTs and 2x2 DCTs)

From the above described procedures, an algorithm for 
calculating a chosen one of 8x8 DCT or four 4x4 DCTs (in 
parallel) or sixteen 2x2 DCTs (in parallel) may be 
derived. The selection is made by the user by assigning 
a certain value to the global variable size: 
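
The effect of the selection can be sketched as below. The mapping of size values to block dimensions (0 for 8x8, 1 for 4x4, 2 for 2x2) is inferred from the description of the bypass MUXes, which bypass the 8x8 operations when size = 1 or 2; it is an assumption, as are the names `BLOCK_SIZE`, `partition` and `process_passes`.

```python
# Assumed mapping of the global variable size to the block dimension.
BLOCK_SIZE = {0: 8, 1: 4, 2: 2}


def partition(block8x8, size):
    """Split an 8x8 block into the sub-blocks selected by size.

    Returns a dict mapping the block index (bi, bj) to its pixel rows.
    """
    n = BLOCK_SIZE[size]
    count = 8 // n
    return {(bi, bj): [row[bj * n:(bj + 1) * n]
                       for row in block8x8[bi * n:(bi + 1) * n]]
            for bi in range(count) for bj in range(count)}


def process_passes(size):
    """Number of times the PROCESS step is executed for a given size.

    The 2x2 DCTs need a single pass; the 4x4 and 8x8 DCTs need two.
    """
    return 1 if size == 2 else 2
```

With this mapping, size = 0 yields one block, size = 1 yields four and size = 2 yields sixteen, all taken from the same 64 input pixels.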



The procedure is subdivided in various phases 
(regardless of the value of the variable size) , to each 
of which corresponds an architectural block. A whole 
view is shown in Fig. 12. Each phase has been organized 
in order to provide for partial results corresponding to 
the chosen value, minimizing redundancies. Sometimes the
operations performed are different depending on the
value of size. In these cases, the architecture
uses a MUX whose control input is size. Let us
now examine the various phases and highlight the
differences with respect to the architectures that have
already been described above:
INPUT phase 

The object of this phase, depicted in Fig. 13, is to 
arrange the data to allow the computation starting from 
the arranged data of the 1-D DCTs. This is done by 
inputting the luminance values of the pixels (8x8 
matrix) and arranging them in eight-component vectors l,
m, ..., s.

For example ; 



i[ol=. 



0,0 



1[1]= 



fjc, 2 per size = 0 or 1 



0 per size = 2 

s[7] = 



^0.7 per size = 0 
X4 7 per size = 1 
7 per size = 2 
PROCESS phase with stage = 0 

This phase includes calculating in parallel the eight
1-D DCTs by processing the vectors l, ..., s as shown
in Fig. 14. In this figure the use of
16 MUXes controlled by the variable size may be observed. The eight
MUXes on the left serve to bypass the operations required
for the computation of the 8x8 DCT. Thus, the bypass
occurs when size = 1 or 2, while it does not occur for
size = 0. The eight MUXes on the right serve to output
only the result that corresponds to the pre-selected
value of size.



    t = \begin{cases} 1 & \text{for stage} = 0 \\ 2 & \text{for stage} = 1 \end{cases}        (24)



0 



J5 = 



2Cg X /g^g 



^8x8 

0 



2Cg X /g^g 



'8x8 



-(^6)8 

-(^3)4 0 
. 0 -(^3)4 



per stage, size = (0,0) or (0,1) 

per stage, size = (0,2) or (1,2) 
per stage, size = (1,0) 

per stagey size = (1,1) 



per stage, size = (0,0) or (0,1) 

per stage, size = (0,2) or (1,2) 
per stage, size = (1,0) 



(25) 



per stage, size = (1,1) 



(26) 



'8x8 



(^2)4 
0 



0 

(^.)4. 



per stage, size = (0,0) or (0,1) 

per stage, size = (0,2) or (1,2) 
per stage, size = (1,0) 



per stage, size = (1,1) 
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^^\6 ^ -'8x8 



per stage ~ 0 
per stage = 1 

per stage = 0 
per stage = 1 

per stage = 0 
per stage = 1 



(28) 



(29) 



(3 0) 



G = 



2C|5 X /g^g 



per stage = 0 
/7er stage = 1 

The scheme in Fig. 14 may be subdivided into the 
architectural blocks shown in Fig. 15. For example, 
two vectors each of eight components (each component 
being a pixel, which may already have been processed) are 
input to the QA block, which outputs two vectors of 
eight components: the first vector is the sum of the two 
input vectors, while the second vector is the difference 
between the two input vectors, subsequently 
processed with the linear operator A. It should be 
noted that the operators A, B, C, D, E, F, G are 8x8 
matrices. 
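
The behaviour of a block such as QA can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the function name q_block is hypothetical, the operator is passed as a plain 8x8 list of rows, and the identity matrix is used in the example merely because it is the value the operators take when size = 2.

```python
def q_block(u, v, op):
    """Return (u + v, op * (u - v)) for two eight-component vectors.

    `op` is an 8x8 matrix given as a list of rows (hypothetical layout).
    """
    if len(u) != 8 or len(v) != 8:
        raise ValueError("both input vectors must have eight components")
    total = [a + b for a, b in zip(u, v)]
    diff = [a - b for a, b in zip(u, v)]
    # Apply the linear operator to the difference vector.
    processed = [sum(op[r][c] * diff[c] for c in range(8)) for r in range(8)]
    return total, processed


# Identity operator, as used for the 2x2 case (A = B = C = I).
IDENTITY_8 = [[1 if r == c else 0 for c in range(8)] for r in range(8)]

u = [1, 2, 3, 4, 5, 6, 7, 8]
v = [8, 7, 6, 5, 4, 3, 2, 1]
s, d = q_block(u, v, IDENTITY_8)
print(s)  # [9, 9, 9, 9, 9, 9, 9, 9]
print(d)  # [-7, -5, -3, -1, 1, 3, 5, 7]
```

In hardware the operator is fixed by the MUX setting rather than passed as an argument; the argument is used here only to keep the sketch short.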

At a lower level of abstraction, the 
QA, QB and QC blocks are shown in detail in Figures 16, 17 
and 18, respectively. In these figures the MUXes are 
controlled by three bits, which correspond to the 
variable stage (which may take the value 0 or 1, and 
thus is represented by one bit) and the variable size 
(which may take the value 0, 1 or 2, and thus is 
represented by two bits). The blocks QD, QE, QF, QG are 
shown in detail in Figures 19, 20, 21 and 22, 
respectively. In these figures the MUXes are controlled 
by a bit that corresponds to the variable stage. 
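
The three-bit MUX control described above can be modelled as follows. This is a sketch under assumptions: the text only states that one bit encodes stage and two bits encode size, so the bit order chosen here (stage as the most significant bit) and the names mux_select and operator_case are hypothetical.

```python
def mux_select(stage, size):
    """Pack stage (1 bit) and size (2 bits) into a 3-bit select word.

    Assumption: stage is the most significant bit.
    """
    if stage not in (0, 1):
        raise ValueError("stage must be 0 or 1")
    if size not in (0, 1, 2):
        raise ValueError("size must be 0, 1 or 2")
    return (stage << 2) | size


def operator_case(stage, size):
    """Name the (stage, size) group that selects the operator in QA, QB, QC."""
    mux_select(stage, size)  # validates the arguments
    if size == 2:
        return "(0,2) or (1,2)"
    if stage == 0:
        return "(0,0) or (0,1)"
    return "(1,0)" if size == 0 else "(1,1)"


print(mux_select(0, 0))     # 0
print(mux_select(1, 1))     # 5
print(mux_select(1, 2))     # 6
print(operator_case(1, 0))  # (1,0)
```

Six of the eight possible select words are used, and they fall into the four groups that appear in the definitions of the operators A, B and C.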
ORDER Phase 

The ORDER phase, depicted in Fig. 23, includes 
arranging the output sequences of the eight 1-D DCTs in 
eight vectors l', m', ..., s'. For example: 

l'[0] = a[0]; 
l'[1] = b[0]; 

…     = e[0]    for size = 0 
        a[4]    for size = 1 
        …[1]    for size = 2 

s'[7] = h[7]; 

OUTPUT Phase 

This phase, depicted in Fig. 24, includes 
rearranging the data coming from the second execution 
(that is, with stage = 1) of the PROCESS step. Starting 
from these data, which constitute the eight-component 
vectors a, b, ..., h, the output block y is 
constituted. 



For example: 

…[0] = …    for size = 0 or 1 
       …    for size = 2 

…    = …    for size = 0 or 1 
       …    for size = 2 

…[7] = …    for size = 0 
       …    for size = 1 
       …    for size = 2 

Description of the Drawings 

A functional block diagram of a picture 
compressor-coder according to the present invention 
may be represented as shown in Figure 1. 

Essentially, the compressor-coder performs a 
hybrid compression based on a fractal coding in the 
DCT domain. This is made possible by the peculiar 
architecture for parallel calculation of the DCT on 
pixel blocks of scaleable size, as described 
above. 

Hereinbelow, the remaining figures are described 
one by one: 

Figure 2 is a flow graph of the 2x2 DCT generating 
block. 

This block is the "base" block that is repeatedly 
used in the PROCESS phase of all the NxN DCTs, where N 
is a power of 2. 
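
As an illustration of such a base block, the following Python sketch computes a 2x2 DCT with two butterfly stages and unit multipliers. It is not a transcription of Fig. 2: the function name dct2x2 and the output order are hypothetical, and the result is left unnormalized (an orthonormal 2x2 DCT would scale every output by 1/2).

```python
def dct2x2(x00, x01, x10, x11):
    """Unnormalized 2x2 DCT of the pixels at (0,0), (0,1), (1,0), (1,1).

    Returns the coefficients in the order (0,0), (0,1), (1,0), (1,1).
    """
    # First butterfly stage: sum and difference along each row.
    s0, d0 = x00 + x01, x00 - x01
    s1, d1 = x10 + x11, x10 - x11
    # Second butterfly stage: sum and difference along each column.
    return (s0 + s1, d0 + d1, s0 - s1, d0 - d1)


print(dct2x2(1, 2, 3, 4))  # (10, -2, -4, 0)
print(dct2x2(5, 5, 5, 5))  # (20, 0, 0, 0)
```

Only additions and subtractions are needed at this size, which is consistent with all multipliers of the flow graph being equal to 1 for the 2x2 case.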

In particular: 

the flow graph for a 2x2 DCT is shown in Fig. 2, 
wherein A = B = C = 1 and the input and output 
data are pixels in the positions (0,0), (0,1), 
(1,0), (1,1); 

for sixteen 2x2 DCTs, the inputs and the 
outputs are eight-component vectors and the 
following symbols are used, considering 
A = B = C = I; 

for four 4x4 DCTs the inputs and outputs are 
eight-component vectors and the following symbols 
are used: 

A = 2C_{…} × I_{8x8}               for stage = 0 
    [ (H_1)_4  0 ; 0  (H_1)_4 ]    for stage = 1 

B = 2C_{…} × I_{8x8}               for stage = 0 
    [ -(H_3)_4  0 ; 0  -(H_3)_4 ]  for stage = 1 

C = 2C_{…} × I_{8x8}               for stage = 0 
    [ (H_2)_4  0 ; 0  (H_2)_4 ]    for stage = 1              (34) 

for an 8x8 DCT, the inputs and outputs are 
eight-component vectors and the following symbols 
are used: 

A = 2C_{…} × I_{8x8}    for stage = 0 
    (H_2)_8             for stage = 1                         (35) 

B = 2C_{…} × I_{8x8}    for stage = 0 
    -(H_6)_8            for stage = 1                         (36) 

C = 2C_{…} × I_{8x8}    for stage = 0 
    (H_4)_8             for stage = 1                         (37) 




In the scaleable architecture for calculating 
an 8x8 DCT or four 4x4 DCTs (in parallel) or 
sixteen 2x2 DCTs (in parallel), the inputs and the 
outputs are vectors of eight components and the 
following symbols are used: 

A = 2C_{…} × I_{8x8}               for (stage, size) = (0,0) or (0,1) 
    I_{8x8}                        for (stage, size) = (0,2) or (1,2) 
    (H_2)_8                        for (stage, size) = (1,0) 
    [ (H_1)_4  0 ; 0  (H_1)_4 ]    for (stage, size) = (1,1)  (38) 

B = 2C_{…} × I_{8x8}               for (stage, size) = (0,0) or (0,1) 
    I_{8x8}                        for (stage, size) = (0,2) or (1,2) 
    -(H_6)_8                       for (stage, size) = (1,0) 
    [ -(H_3)_4  0 ; 0  -(H_3)_4 ]  for (stage, size) = (1,1)  (39) 

C = 2C_{…} × I_{8x8}               for (stage, size) = (0,0) or (0,1) 
    I_{8x8}                        for (stage, size) = (0,2) or (1,2) 
    (H_4)_8                        for (stage, size) = (1,0) 
    [ (H_2)_4  0 ; 0  (H_2)_4 ]    for (stage, size) = (1,1)  (40) 


Figure 3 illustrates the architecture for calculating 
sixteen 2x2 DCTs in parallel. 

The pixels that constitute the input block are 
ordered during the INPUT phase and processed during the 
PROCESS phase to obtain the coefficients of the sixteen 
2-D DCTs on four samples. For example, the 2-D DCT of 
the block (0,1) constituted by 
{l[0], m[0], n[0], o[0]} is {a[0], b[0], c[0], d[0]}. 

The coefficients of the 2-D DCTs are rearranged 
during the ORDER phase in eight vectors of eight 
components. For example, the coefficients 
{a[0], b[0], c[0], d[0]} will constitute the vector l'. 

The sixteen two-component vectors so obtained are 
sent to the PROCESS phase to obtain the coefficients of 
the 2x2 DCT. These coefficients, reordered during the 
OUTPUT phase, constitute the output block. 

Figure 4 shows the ordering of the input data for 
calculating sixteen 2x2 DCTs. 

This figure shows the way the pixels of the 8x8 input 
block are ordered to constitute the eight-component 
vectors l, m, ..., s. In each quadrant with 
0 ≤ i,j ≤ 3, the pixels belonging to the vectors are 
symbolized by different shadings. For example: 

{A1_k}_{k=0}^{1} = {x_{0,0}, x_{1,1}} 

From each of these vectors, the components with the 
same index (that is, the pixels with the same column 
index) will form a vector of four components. For 
example, the vector l is constituted by the elements 
{A1[0], B1[0]}. 

Therefore, each pixel of the 8x8 input block will 
constitute a component of one of the vectors l, m, n, o. 

Figure 5 shows the PROCESS phase for calculating 
sixteen 2x2 DCTs. 

This phase includes processing the eight-component 
vectors l, m, ..., s. The PROCESS phase, which is the 
only phase in which arithmetical operations are 
performed, is executed only once to calculate in 
parallel the sixteen 2-D DCTs. 

Figure 6 illustrates the architecture for calculating 
four 4x4 DCTs. 

The pixels that constitute the input block are 
ordered in the INPUT phase and processed in the PROCESS 
phase to obtain the coefficients of the sixteen 1-D 
DCTs on 4 samples. For example, the 1-D DCT of the 
sequence {l[0], m[0], n[0], o[0]} is 
{a[0], b[0], c[0], d[0]}. 

The coefficients of the 1-D DCTs are reordered in the 
ORDER phase in 8 vectors of eight components. For 
example, the coefficients {a[0], b[0], c[0], d[0]} will 
constitute the vector l'. 

The 4 four-component vectors so obtained are sent to 
the PROCESS phase to obtain the coefficients of the 4x4 
DCT. These coefficients, reordered in the OUTPUT phase, 
constitute the output block. 

Figure 7 shows the arrangement of the input data for 
calculating four 4x4 DCTs. 

This figure shows how the pixels of the 8x8 input 
block are ordered to constitute the eight-component 
vectors l, m, ..., s. 

In each quadrant with 0 ≤ i,j ≤ 3, the pixels 
belonging to the different vectors have different 
shadings. For example: 



From each of these vectors, the components with the 
same index (that is, the pixels with the same column 
index) will form a vector of four components. For 
example, the vector l is constituted by the elements 
{A1[0], A3[0], B3[0], B1[0]}. 

The outcome is that each pixel of the input 8x8 block 
will constitute one component of one of the vectors l, 
m, n, o, p, q, r, s. 

Figure 8 depicts the PROCESS phase for calculating 
the four 4x4 DCTs. 

This phase includes processing the eight-component 
vectors l, m, ..., s. 

The PROCESS phase, which is the only phase wherein 
arithmetical operations are performed, is carried out 
twice: 

the first time (stage = 0), to calculate in 
parallel the sixteen 1-D DCTs; 

the second time (stage = 1), to calculate the 4x4 
DCTs starting from the coefficients of the 1-D DCTs. 

Figure 9 illustrates the architecture for calculating 
an 8x8 DCT. 

The pixels that constitute the input block are 
ordered during the INPUT phase and are processed in the 
PROCESS phase to obtain the coefficients of the eight 
1-D DCTs on 8 samples. For example, the 1-D DCT of the 
sequence {l[0], m[0], ..., s[0]} is {a[0], b[0], ..., 
h[0]}. 

The coefficients of the 1-D DCTs are rearranged 
during the ORDER phase in 8 vectors of eight components. 
For example, the coefficients {a[0], b[0], ..., h[0]} 
will constitute the vector l'. 

The 8 eight-component vectors so obtained are sent to 
the PROCESS phase to obtain the 8x8 DCT coefficients. 
These coefficients, rearranged during the OUTPUT phase, 
constitute the output block. 
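
The sequence PROCESS, ORDER, PROCESS, OUTPUT for the 8x8 case can be modelled in software as a row-column DCT. The sketch below is a reference model under assumptions, not the parallel flow graph of the invention: the 1-D DCT is evaluated directly from the orthonormal DCT-II definition, the input vectors are simply the rows of the block rather than the arrangement of Fig. 10, ORDER and OUTPUT are both modelled as a transposition, and all function names are hypothetical.

```python
import math


def dct_1d(seq):
    """Orthonormal 1-D DCT-II of a sequence, computed from the definition."""
    n = len(seq)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                  for i, x in enumerate(seq))
        out.append(scale * acc)
    return out


def process(vectors):
    """PROCESS: one 1-D DCT per vector (done in parallel in hardware)."""
    return [dct_1d(v) for v in vectors]


def order(vectors):
    """ORDER: regroup coefficients with the same index (a transposition)."""
    return [list(col) for col in zip(*vectors)]


def dct_8x8(block):
    """8x8 2-D DCT as PROCESS (stage 0), ORDER, PROCESS (stage 1), OUTPUT."""
    stage0 = process(block)
    reordered = order(stage0)
    stage1 = process(reordered)
    return order(stage1)  # OUTPUT modelled as a second transposition


flat = [[1.0] * 8 for _ in range(8)]
coeffs = dct_8x8(flat)
print(round(coeffs[0][0], 6))  # 8.0
```

For a constant block only the coefficient at position (0,0) is non-zero, which gives a quick sanity check of the two-pass structure.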

Figure 10 shows the arrangement of the input data for 
calculating an 8x8 DCT. 

This figure shows how the pixels of the input 8x8 
block are arranged to constitute the 8 eight-component 
vectors l, m, ..., s. The pixels belonging to the 
vectors A1, A3, A5, A7, B7, B5, B3, B1 are symbolized 
with different shadings, for example: 

{A1_j}_{j=0}^{7} = {x_{0,0}, x_{1,1}, x_{2,2}, x_{3,3}, x_{4,4}, x_{5,5}, x_{6,6}, x_{7,7}} 

From each of these vectors, the components with the 
same index (that is, the pixels with the same column 
index) will form a vector of eight components. For 
example, the vector l is constituted by the elements 
{A1[0], A3[0], ..., B1[0]}. 

The result is that each pixel of the input 8x8 block 
will constitute a component of one of the vectors l, m, 
n, o, p, q, r, s. 

Figure 11 depicts the PROCESS phase for calculating 
an 8x8 DCT. 

This phase includes processing the eight-component 
vectors l, m, ..., s. 

The PROCESS phase, in which arithmetical operations are 
performed, is executed twice: 

the first time (stage = 0), to calculate in parallel 
the eight 1-D DCTs; 

the second time (stage = 1), to calculate the 8x8 
DCT starting from the coefficients of the 1-D DCTs. 

In Fig. 11 the following symbols have been used: 


A = 2C_{…} × I_{8x8}    for stage = 0 
    (H_2)_8             for stage = 1 

B = 2C_{…} × I_{8x8}    for stage = 0 
    -(H_6)_8            for stage = 1 

C = 2C_{…} × I_{8x8}    for stage = 0 
    (H_4)_8             for stage = 1                         (47) 

t = 1    for stage = 0 
    2    for stage = 1 

Figure 12 illustrates a scaleable architecture for 
calculating an 8x8 DCT or four 4x4 DCTs or sixteen 2x2 
DCTs. 

The pixels that constitute the input block are ordered 
during the INPUT phase and processed during the PROCESS 
phase, which calculates: 

the 1-D DCTs for size = 0, that is for the 8x8 
DCT, and for size = 1, that is for the 4x4 DCTs; 

the 2-D DCTs directly for size = 2, that is for 
the 2x2 DCTs. 

When size = 0 or size = 1, the coefficients are 
then rearranged in the ORDER phase in 8 eight-component 
vectors, which are sent to the PROCESS phase to obtain 
the coefficients of the 2-D DCT. These coefficients, 
rearranged in the OUTPUT phase, constitute the output 
block. 

If size = 2, the coefficients are transmitted 
directly to the OUTPUT phase, where they are rearranged 
to constitute the output block. 
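
The effect of the variable size on the result can be illustrated with a behavioural model in Python. This is a sketch under assumptions and does not reproduce the shared data path of Fig. 12: each sub-block is transformed independently with an orthonormal DCT-II evaluated from its definition, the coefficients of each sub-block are written back at the position of that sub-block (the actual arrangement is fixed by the OUTPUT phase), and the names BLOCK_EDGE, dct_2d and scalable_dct are hypothetical.

```python
import math

# size = 0 -> one 8x8 DCT, size = 1 -> four 4x4 DCTs, size = 2 -> sixteen 2x2 DCTs
BLOCK_EDGE = {0: 8, 1: 4, 2: 2}


def dct_1d(seq):
    """Orthonormal 1-D DCT-II of a sequence, computed from the definition."""
    n = len(seq)
    return [
        (math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
        * sum(x * math.cos((2 * i + 1) * k * math.pi / (2 * n))
              for i, x in enumerate(seq))
        for k in range(n)
    ]


def dct_2d(block):
    """Separable 2-D DCT: rows first, then columns."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(c) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def scalable_dct(block, size):
    """Transform an 8x8 block as one 8x8, four 4x4 or sixteen 2x2 DCTs."""
    n = BLOCK_EDGE[size]
    out = [[0.0] * 8 for _ in range(8)]
    for r0 in range(0, 8, n):
        for c0 in range(0, 8, n):
            sub = [row[c0:c0 + n] for row in block[r0:r0 + n]]
            coeffs = dct_2d(sub)
            for i in range(n):
                out[r0 + i][c0:c0 + n] = coeffs[i]
    return out


flat = [[1.0] * 8 for _ in range(8)]
print(round(scalable_dct(flat, 0)[0][0], 6))  # 8.0
print(round(scalable_dct(flat, 1)[4][4], 6))  # 4.0
print(round(scalable_dct(flat, 2)[6][2], 6))  # 2.0
```

A constant input block yields one, four or sixteen non-zero coefficients depending on size, one per sub-block.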

Figure 13 depicts the INPUT phase for a scaleable 
architecture. 

The inputs are the 64 pixels that constitute the 
input block. 

The arrangement of the inputs is performed by the 
MUXes controlled by the size variable. 

The 64 outputs are the 8 vectors of eight components. 

Figure 14 depicts the PROCESS phase for a scaleable 
architecture. 

This phase includes calculating in parallel the eight 
1-D DCTs by processing the vectors l, m, ..., s as shown 
in Fig. 11. 

This figure uses 16 MUXes 
controlled by size. 

The eight MUXes on the left bypass the 
operations that are necessary only for calculating the 8x8 DCT; 
therefore, the bypass takes place for size = 1 or 2, 
while it does not occur when size = 0. 

The eight MUXes on the right output only the 
result corresponding to the pre-selected size. 

In Fig. 14 the following symbols are used: 

t = 1    for stage = 0 
    2    for stage = 1 

A = 2C_{…} × I_{8x8}               for (stage, size) = (0,0) or (0,1) 
    I_{8x8}                        for (stage, size) = (0,2) or (1,2) 
    (H_2)_8                        for (stage, size) = (1,0) 
    [ (H_1)_4  0 ; 0  (H_1)_4 ]    for (stage, size) = (1,1) 

B = 2C_{…} × I_{8x8}               for (stage, size) = (0,0) or (0,1) 
    I_{8x8}                        for (stage, size) = (0,2) or (1,2) 
    -(H_6)_8                       for (stage, size) = (1,0) 
    [ -(H_3)_4  0 ; 0  -(H_3)_4 ]  for (stage, size) = (1,1) 

C = 2C_{…} × I_{8x8}               for (stage, size) = (0,0) or (0,1) 
    I_{8x8}                        for (stage, size) = (0,2) or (1,2) 
    (H_4)_8                        for (stage, size) = (1,0) 
    [ (H_2)_4  0 ; 0  (H_2)_4 ]    for (stage, size) = (1,1) 

Figure 15 is a block diagram of the structure that 
implements the PROCESS phase. 

For example, the QA block receives as an input two 
vectors of eight components (each component is a pixel, 
which may already have been processed) and outputs two 
vectors of eight components. The first vector is the sum 
of the two input vectors, while the second vector is the 
difference between the two input vectors, subsequently 
processed with the linear operator A. It should be 
noted that the A, B, C, D, E, F, G operators are 8x8 
matrices. 

Figure 16 is a detailed scheme of the QA block. 

This scheme shows the details of the single 
components of the two input vectors and the arithmetical 
operators (adders etc.) which act on each component. The 
results are sent to the MUXes depicted on the right side 
of the figure, each of which, depending on the control 
variables stage and size, selects only one result, which 
constitutes one component of the output vector. 

Figure 17 is a detailed scheme of the QB block. 

This scheme shows the details of the single 
components of the two input vectors and the arithmetical 
operators (adders etc.) which act on each component. The 
results are sent to the MUXes depicted on the right side 
of the figure, each of which, depending on the control 
variables stage and size, selects only one result, which 
constitutes one component of the output vector. 

Figure 18 is a detailed scheme of the QC block. 

This scheme shows the details of the single 
components of the two input vectors and the arithmetical 
operators (adders etc.) which act on each component. The 
results are sent to the MUXes depicted on the right side 
of the figure, each of which, depending on the control 
variables stage and size, selects only one result, which 
constitutes one component of the output vector. 

Figure 19 is a detailed scheme of the QD block. 

This scheme shows the details of the single 
components of the two input vectors and the arithmetical 
operators (adders etc.) which act on each component. The 
results are sent to the MUXes depicted on the right side 
of the figure, each of which, depending on the control 
variable stage, selects only one result, which 
constitutes one component of the output vector. 

Figure 20 is a detailed scheme of the QE block. 

This scheme shows the details of the single 
components of the two input vectors and the arithmetical 
operators (adders etc.) which act on each component. The 
results are sent to the MUXes depicted on the right side 
of the figure, each of which, depending on the control 
variable stage, selects only one result, which constitutes 
one component of the output vector. 

Figure 21 is a detailed scheme of the QF block. 

This scheme shows the details of the single 
components of the two input vectors and the arithmetical 
operators (adders etc.) which act on each component. The 
results are sent to the MUXes depicted on the right side 
of the figure, each of which, depending on the control 
variable stage, selects only one result, which 
constitutes one component of the output vector. 

Figure 22 is a detailed scheme of the QG block. 

This scheme shows the details of the single 
components of the two input vectors and the arithmetical 
operators (adders etc.) which act on each component. The 
results are sent to the MUXes depicted on the right side 
of the figure, each of which, depending on the control 
variable stage, selects only one result, which constitutes 
one component of the output vector. 

Figure 23 depicts the ORDER phase for the scaleable 
architecture. 

The inputs are constituted by the 64 pixels after 
they have been processed through the PROCESS phase. 

The arrangement of the inputs is performed by the MUXes 
controlled by the variable size. 

The 64 outputs are the components of the 
eight-component vectors l, m, ..., s. 

Figure 24 depicts the OUTPUT phase for the scaleable 
architecture. 

The inputs are constituted by the 64 2-D DCT 
coefficients. The arrangement of the inputs is performed 
by the MUXes controlled by the variable size. 

The 64 outputs are the pixels that constitute the 
output block. 



